Fix performance when checking dataset version file count#12494
Conversation
landreev
left a comment
There was a problem hiding this comment.
Looks good; extra logic for guest user helps.
This comment has been minimized.
This comment has been minimized.
| @@ -257,15 +260,17 @@ public boolean canIssueDeleteDatasetCommand(DvObject dvo){ | |||
| // PUBLISH DATASET | |||
| public boolean canIssuePublishDatasetCommand(DvObject dvo){ | |||
There was a problem hiding this comment.
These are in the permissionsWrapper class - shouldn't the results also be cached for the life of the bean?
There was a problem hiding this comment.
Why would we cache the results? It would change when a file upload completes
There was a problem hiding this comment.
If caching isn't appropriate, should the method be in the PermissionServiceBean instead? The *Wrapper classes are viewscoped and should just have cached info (probably not enforced). That would make the call accessable via API as well.
There was a problem hiding this comment.
Generally, we want everything that is called from JSF rendered="#{...}" attributes cached (because these are evaluated 7 times). Probably a good idea even when there is no reason to expect the task to be expensive.
In this specific case, I am actually not sure off the top of my head about the "would change when a file upload completes"... PermissionsWrapper is @ViewScoped. Does that mean that it will be a new "View" when the page re-renders itself after an upload completes? Or the same one?
There was a problem hiding this comment.
Would be easy to test though.
There was a problem hiding this comment.
Good point - I'm not sure w/o digging - @viewScoped would stay as long as we're just doing ajax updates? We also cache in the DatasetPage at times - that would be a place where the cache could be emptied when a file upload completes.
There was a problem hiding this comment.
looked into it some more
-
From a quick test, caching that info within a View does NOT prevent the page from properly updating itself once one or more files has been uploaded (that re-rendering apparently counts as a new View).
-
Yes, we DO want to cache this (on top of caching the result of
permissionService...canIssue(command)that we are already doing there). JSF is calling these 2 methods over and over incessantly. Having that NativeQuery executed every time feels wrong. -
I mentioned in the perf. issue that "The query count goes from 9K+ down to 813 ... This is ~100 more than it has been consistently between the last few releases, but ..." That was from a test of the first iteration of this PR, that was not checking for
|| u instanceof GuestUser. The extra 100 queries were solely from the native query and whatever it instantiated in thedatasetVersionService.hasFiles()performed on the 10 underlying datasets. Re-testing the PR in its current state brought that query count down to 715. (it was 714 in 6.10.1).
It will be worse for the dataset page and a logged in browser user.
... So yes, we want to cache.
We'll wrap it up tomorrow.
There was a problem hiding this comment.
FYI, this was my quick experiment last night: #12495 (this is a pr into this branch).
I am not positive if this is the best way of handling this.
I'm hoping to be able to focus on other work today.
There was a problem hiding this comment.
One last comment:
What's truly annoying, is that the only serious performance degradation this was causing was on the dataverse.xhtml page; where, from what I understand, the check on the number of files may not even be relevant. (the "can publish" check there seems to be a proxy for "can see [some] curation info" - ?)
In dataset.xhtml, where we do need to know whether the dataset has files or not, the original ...Version.getFileMetadatas().size() was NOT even a problem, I don't think - because the filemetadatas are already instantiated, via a monster findDeep() query.
... we should avoid dumping any excessive amounts of work into this, in other words.
There was a problem hiding this comment.
(although permissionsWrapper.canIssuePublishDatasetCommand() is also called from FilePage.java; it'll definitely help to avoid unnecessarily instantiating all the files in the parent dataset in there as well)
| @@ -257,15 +260,17 @@ public boolean canIssueDeleteDatasetCommand(DvObject dvo){ | |||
| // PUBLISH DATASET | |||
| public boolean canIssuePublishDatasetCommand(DvObject dvo){ | |||
| User u = session.getUser(); | |||
There was a problem hiding this comment.
doesn't getUser return a GuestUser? i.e. getAuthenticatedUser() is needed for the check?
This comment has been minimized.
This comment has been minimized.
(I meant to add that as a comment; it would be great if somebody else approved it)
Test Results403 tests 388 ✅ 33m 27s ⏱️ Results for commit 9fac0bb. ♻️ This comment has been updated with latest results. |
644795a to
1a19e93
Compare
This comment has been minimized.
This comment has been minimized.
| <h:outputText value="#{bundle['embargoed']}" styleClass="label label-primary" rendered="#{SearchIncludeFragment.isActivelyEmbargoed(result)}"/> | ||
| <h:outputText value="#{bundle['retentionExpired']}" styleClass="label label-warning" rendered="#{SearchIncludeFragment.isRetentionExpired(result)}"/> | ||
| <h:outputText value="#{DatasetUtil:getLocaleCurationStatusLabelFromString(result.curationStatus)}" styleClass="label label-info curation-status" rendered="#{SearchIncludeFragment.canSeeCurationStatus(result.entityId)}"/> | ||
| <h:outputText value="#{DatasetUtil:getLocaleCurationStatusLabelFromString(result.curationStatus)}" styleClass="label label-info curation-status" rendered="#{SearchIncludeFragment.canSeeCurationStatus(result.entity)}"/> |
There was a problem hiding this comment.
entity is only non-null if retrieveEntities was set in the search call - see
- I don't know if that's the case in all the uses of this fragment or not.There was a problem hiding this comment.
added a check for null entity
There was a problem hiding this comment.
I thought we had given up on ever being able to avoid instantiating that entity when running searches from within the dataverse page.
But yes, since that retrieveEntities argument exists, it is prudent to assume the entity can be null, as in the latest commit.
There was a problem hiding this comment.
this fix also saves the entity in the result so it should only look it up once
There was a problem hiding this comment.
I'd suggest adding a check in the render logic to only go into all of this if result.curationStatus is not null/blank. Hopefully relatively few datasets will have one (only draft, only if they curation Status labels are used, etc.).
In general, I'm not a big fan of hiding the logic to get an entity off in this corner case - if the entity at this call is null, code moved from after to before this call could suddenly fail with a null entity (or also have to implement a check). (Or, more fun, if you add the curationStatus null check, some unrelated code could succeed when there is a curationStatus and fail when there isn't.)
It may be out of scope for a quick fix (is a quick fix needed if you check for curationStatus being null?), but it might be better to only getEntity() when needed and drop the retrieveEntities flag which gets them for every result even if some results never use it.
There was a problem hiding this comment.
added check so the canSeeCurationStatus only gets called if there is a curation status.
reverted back to sending the dataset id and looking up each time. There were 5 calls to get the dataset but only for 1 search result entity id.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| <h:outputText value="#{bundle['embargoed']}" styleClass="label label-primary" rendered="#{SearchIncludeFragment.isActivelyEmbargoed(result)}"/> | ||
| <h:outputText value="#{bundle['retentionExpired']}" styleClass="label label-warning" rendered="#{SearchIncludeFragment.isRetentionExpired(result)}"/> | ||
| <h:outputText value="#{DatasetUtil:getLocaleCurationStatusLabelFromString(result.curationStatus)}" styleClass="label label-info curation-status" rendered="#{SearchIncludeFragment.canSeeCurationStatus(result.entityId)}"/> | ||
| <h:outputText value="#{DatasetUtil:getLocaleCurationStatusLabelFromString(result.curationStatus)}" styleClass="label label-info curation-status" rendered="#{!empty result.curationStatus and SearchIncludeFragment.canSeeCurationStatus(result.entityId)}"/> |
There was a problem hiding this comment.
I totally support adding the curationStatus check to the rendering logic.
... but I am confused as to why we are back to passing result.entityId, and looking up the dvobject in the SearchIncludeFragment.java method - ? ... even knowing that in most cases the lookups will be avoided by the curationStatus checks (although I am hearing we'll be enabling these curation labels at Harvard too).
For 6.11 I really wanted to suggest the opposite: keep passing the result, or result.entity, and simply return false if the entity is null. The entity should be there in normal operations. There are other methods in the fragment relying on result.entity.
... @qqmyers says he's ok either way via slack. I know it's already taken longer than necessary. But it just feels wrong seeing those lookups back in the code, after having seen the code that avoided them - no?
I support revisiting and cleaning up how we handle solr search results past 6.11 also.
|
📦 Pushed preview images as 🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name. |
landreev
left a comment
There was a problem hiding this comment.
Will re-test and merge this long-suffering thing momentarily.
|
Merging; will add some numbers in the 6.11 perf. issue. |
What this PR does / why we need it: Performance issues with sql call
#12284
Which issue(s) this PR closes:
Special notes for your reviewer:
Suggestions on how to test this:
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: